Bilingual Insights into the Initial Lexicon

The Role of Cognates in Word Acquisition

Gonzalo Garcia-Castro

PhD Defence / Departament de Medicina i Ciències de la Vida

2024-11-03


The initial lexicon

Average 20-year-old knows ~42,000 lemmas: mental lexicon

Lexical representations
Phonological, conceptual, grammatical information of known words

First lexical representations at 6-9 months

Normative trajectories of lexical development

Vocabulary size norms for 51,800 monolingual children learning 35 distinct languages (Frank et al. 2017)

Bilinguals face additional challenges, but do not lag behind



Increased complexity in linguistic context: two overlapping codes

Reduced linguistic input in each language: split into two languages

Increased referential ambiguity: > 2 labels per referent

Hoff et al. (2012): bilinguals acquire words at rates similar to those of monolinguals

Lexical similarity modulates vocabulary growth in bilinguals

Floccia et al. (2018): CDI responses of 372 bilinguals learning English + additional language

Lexical similarity: Average phonological similarity (Levenshtein) between pairs of translations


Higher lexical similarity, larger vocabulary size

Stronger effect in the additional language (e.g., Dutch, Mandarin)

Lexical similarity modulates vocabulary acquisition in bilinguals

A cognate facilitation in lexical acquisition?

Cognates: Phonologically-similar translation equivalents

Cognate: [cat] /ˈgat/ (Catalan), /ˈga.to/ (Spanish)
Non-cognate: [dog] /ˈgos/ (Catalan), /ˈpe.ro/ (Spanish)
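The similarity between two translation equivalents can be sketched with a normalised Levenshtein measure. This is an illustration under assumptions, not the dissertation's code: the word forms are simplified to plain strings without stress or syllable marks, and the distance is normalised by the length of the longer form (the slides do not state the normalisation). Function names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer form."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


# Simplified transcriptions of the Catalan and Spanish forms (assumption)
cat_pair = levenshtein_similarity("gat", "gato")  # cognate pair
dog_pair = levenshtein_similarity("gos", "pero")  # non-cognate pair
print(cat_pair, dog_pair)
```

Under these assumptions the cognate pair scores 0.75 and the non-cognate pair scores 0.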

Some evidence that cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2023; Bosch and Ramon-Casas 2014)


What mechanisms support a cognate facilitation during word acquisition?

Lexical access is language non-selective in bilinguals

The present dissertation

Study 1

  1. Provide a mechanistic account for the cognateness facilitation
  2. Test predictions of the model

Under review in Child Development (R2)

Study 2

  1. Test core assumption of the model: Language non-selectivity in the initial lexicon

In preparation

Study 1

Cognate beginnings to lexical acquisition: The AMBLA model

Accumulator Model of Bilingual Lexical Acquisition (AMBLA)

  1. Accumulation of information about form-meaning mappings:

Learning instances: Exposure to a word-form that results in the accumulation of information about its meaning

  2. Age of acquisition: The age at which the infant has accumulated a threshold number of learning instances for a word-form

\[ \begin{aligned} \definecolor{myred}{RGB}{ 168, 0, 53 } \definecolor{myblue}{RGB}{ 0, 64, 168 } \definecolor{mygreen}{RGB}{0, 168, 87} \definecolor{grey}{RGB}{128, 128, 128} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ {\color{mygreen}\text{Age of Acquisition}_{ij}} &= \{\text{Age}_i \mid {\color{myred}\text{Learning instances}_{ij}} = {\color{myblue}\text{Threshold}} \}\\ {\color{myred}\text{Learning instances}_{ij}} &= \text{Age}_i \cdot \text{Freq}_j \\ \end{aligned} \]

AMBLA: Simulating monolingual word acquisition

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Parameters:

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
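A minimal sketch of the monolingual simulation with the parameters above. It assumes that Age and Freq share a time unit (for example, months and learning instances per month), which the slide does not state. The Poisson sampler and all function names are illustrative, not the dissertation's code.

```python
import math
import random

THRESHOLD = 300  # learning instances needed for acquisition (from the slide)
MEAN_FREQ = 50   # rate of the Poisson distribution for Freq_j (from the slide)


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def age_of_acquisition(freq: float, threshold: float = THRESHOLD) -> float:
    """Age at which Age * Freq reaches the threshold."""
    return threshold / freq


rng = random.Random(1)  # fixed seed so the sketch is reproducible
freqs = [sample_poisson(MEAN_FREQ, rng) for _ in range(1000)]
ages = [age_of_acquisition(f) for f in freqs if f > 0]

print(age_of_acquisition(MEAN_FREQ))  # a word with exactly the mean frequency: 6.0
print(sum(ages) / len(ages))          # average over the simulated words
```

More frequent word-forms cross the threshold earlier, so the spread in simulated ages comes entirely from the spread in Freq.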

AMBLA: Simulating bilingual word acquisition

  1. Linguistic input divided into two languages: Catalan 60%, Spanish 40%

Exposure: Proportion of time exposed to the language of word \(j\)

Accumulation of learning instances is a function of Exposure and Frequency.

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot {\color{myred}\text{Exposure}_{ij}}\\ \end{aligned} \]

AMBLA: Simulating bilingual word acquisition

Catalan monolingual child

  • /ˈgos/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgos/ (Catalan), 60%

  • /ˈpe.ro/ (Spanish), 40%

Parameters:

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
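The effect of splitting input across two languages can be sketched by plugging the exposure proportions into the accumulation rule. This assumes every word-form has exactly the mean frequency (50) so that only Exposure varies. Names are illustrative.

```python
THRESHOLD = 300  # from the slide
FREQ = 50        # mean of Poisson(50); fixed here for simplicity (assumption)


def age_of_acquisition(freq: float, exposure: float,
                       threshold: float = THRESHOLD) -> float:
    """Age at which Age * Freq * Exposure reaches the threshold."""
    return threshold / (freq * exposure)


monolingual_gos = age_of_acquisition(FREQ, exposure=1.0)  # Catalan, 100%
bilingual_gos = age_of_acquisition(FREQ, exposure=0.6)    # Catalan, 60%
bilingual_pero = age_of_acquisition(FREQ, exposure=0.4)   # Spanish, 40%
print(monolingual_gos, bilingual_gos, bilingual_pero)
```

With these values the monolingual child reaches the threshold at age 6, while the bilingual child reaches it at about 10 for the Catalan form and about 15 for the Spanish form. Without any cross-language mechanism, the model therefore predicts later acquisition in each language.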

AMBLA: Simulating a cognate facilitation

  1. Words may accumulate additional learning instances from the co-activation of their (phonologically similar) translation equivalent

Degree proportional to their phonological similarity (Cognateness)

\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot \text{Exposure}_{ij} + \\ &({\color{myred}\text{Learning instances}_{ij'} \cdot \text{Cognateness}_{j,j'}})\\ \textbf{where:} \\ {\color{myred}\text{Cognateness}_{j,j'}}&{\color{myred} = \text{Levenshtein}(j, j')} \end{aligned} \]

AMBLA: Simulating a cognate facilitation

Catalan monolingual child

  • /ˈgat/ (Catalan), 100%

Catalan/Spanish bilingual child

  • /ˈgat/ (Catalan), 60%

  • /ˈga.to/ (Spanish), 40%

\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \\ \text{Cognateness}_{j,j'} = 0.75 \end{aligned} \]
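A sketch of how the cognate term changes the simulated ages, under two simplifying assumptions that are not taken from the slides: both translations have the mean frequency (50), and the borrowed term counts only the translation's own exposure-driven learning instances (no recursive feedback between the two forms). Names are illustrative.

```python
THRESHOLD = 300  # learning instances needed for acquisition (from the slide)
FREQ = 50        # mean of Poisson(50), used for both translations (assumption)


def learning_rate(exposure, exposure_translation, cognateness, freq=FREQ):
    """Learning instances gained per unit of age (first-order reading)."""
    own = freq * exposure
    borrowed = freq * exposure_translation * cognateness
    return own + borrowed


def age_of_acquisition(exposure, exposure_translation, cognateness):
    """Age at which the accumulated learning instances reach the threshold."""
    return THRESHOLD / learning_rate(exposure, exposure_translation, cognateness)


def facilitation(exposure, exposure_translation, cognateness):
    """How much earlier the word is acquired compared with cognateness = 0."""
    baseline = age_of_acquisition(exposure, exposure_translation, 0.0)
    return baseline - age_of_acquisition(exposure, exposure_translation, cognateness)


gain_catalan = facilitation(0.6, 0.4, 0.75)  # /ˈgat/, higher-exposure language
gain_spanish = facilitation(0.4, 0.6, 0.75)  # /ˈga.to/, lower-exposure language
print(round(gain_catalan, 2), round(gain_spanish, 2))
```

Under this reading, the form in the lower-exposure language gains more from its cognate translation, because the translation accumulates learning instances faster than the form itself.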

Predictions

  1. Cognates acquired earlier than non-cognates
  2. Cognateness facilitation stronger in the lower-exposure language

Barcelona Vocabulary Questionnaire (BVQ)


  • Online, open source
  • \(\approx\) 1,600 words (800 Cat., 800 Spa.)
  • 4 sublists, random allocation

Results: Comprehension

Ordinal, multilevel (Bayesian) regression model

366 children (12-32 mo), 436 administrations \(\times\) 604 noun words

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)
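How an ordinal regression turns the predictors into response probabilities can be sketched with a cumulative-logit link. Everything specific here is an assumption for illustration: the three ordered response categories, the link function, the coefficients and the cutpoints are hypothetical and are not estimates from the study. Participant- and item-level effects are omitted.

```python
import math


def inverse_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(exposure, cognateness,
                     b_exposure, b_cognateness, b_interaction):
    """Main effects of Exposure and Cognateness plus their interaction."""
    return (b_exposure * exposure
            + b_cognateness * cognateness
            + b_interaction * exposure * cognateness)


def ordinal_probabilities(eta: float, cutpoints: list) -> list:
    """Category probabilities under a cumulative-logit model.

    P(y <= k) = inverse_logit(cutpoint_k - eta); differences between
    successive cumulative probabilities give each category's probability.
    """
    cumulative = [inverse_logit(c - eta) for c in cutpoints] + [1.0]
    lower = [0.0] + cumulative[:-1]
    return [upper - low for upper, low in zip(cumulative, lower)]


# Hypothetical values, chosen only to make the sketch run
eta = linear_predictor(exposure=0.4, cognateness=0.75,
                       b_exposure=2.0, b_cognateness=1.0, b_interaction=-1.0)
probs = ordinal_probabilities(eta, cutpoints=[0.0, 1.5])
print([round(p, 3) for p in probs])  # e.g. no / understands / understands and says
```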

Results: Production

Ordinal, multilevel (Bayesian) regression model

366 children (12-32 mo), 436 administrations \(\times\) 604 noun words

\(p(\text{Comprehension}, \text{Production}) \sim \text{Exposure}_{ij} \cdot \text{Cognateness}_j\)

Discussion

Earlier acquisition for cognates vs. non-cognates

Cognate facilitation moderated by exposure

Only words from the lower-exposure language benefit from cognateness

Cognateness as a candidate mechanism underlying Floccia et al.’s results

Cross-language facilitation via co-activation of phonologically similar translation equivalents

Is language non-selectivity already present in the initial lexicon?

Study 2

Developmental trajectories of bilingual spoken word recognition

Language non-selectivity in the initial lexicon

Some evidence in infants and children (e.g., Von Holzen and Mani 2012; Singh 2014)

Methodological pitfalls: “Bilingual” task

Implicit naming task

Mani and Plunkett (2010, 2011)

Implicit naming task

Mani and Plunkett (2011):

  • Chance-level target looking in related trials
  • Prime-Target phonological interference
  • Implicit naming of prime pictures

Study 2: Design

Extending the task to test cross-language priming in bilinguals.

Change in order of trial timecourse:


Auditory label before target-distractor images

Length of Catalan and Spanish words

Temporal proximity of prime and target labels

Predictions and dataset

Exp. 1: Monolinguals

Replicate within-language phonological interference from Mani and Plunkett (proof of concept)

Exp. 2: Monolinguals and bilinguals

If language non-selectivity, stronger interference in cognate vs. non-cognate trials

79 English monolinguals

89 sessions

77 Catalan/Spanish monolinguals

107 sessions

78 Catalan/Spanish bilinguals

133 sessions

Experiment 1: Results, Bayesian GAMMs

English monolinguals

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Experiment 2: Results, Bayesian GAMMs

Catalan/Spanish monolinguals

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Cognate trials \(\approx\) Non-cognate trials

Experiment 2: Results, Bayesian GAMMs

Catalan/Spanish bilinguals

No evidence of phonological priming

Related trials \(\approx\) Unrelated trials

Cognate trials \(\approx\) Non-cognate trials

Discussion


Successful spoken word recognition across ages and language profiles

No evidence of priming effects, within or across languages

Unsuccessful retrieval of prime phonological forms?

Inconclusive results, revise design

General discussion

Summary

Cognateness facilitates word acquisition in the lower-exposure language

Candidate mechanism behind bilingual vocabulary growth

AMBLA: Cross-language accumulation of learning instances

Language non-selectivity in the initial lexicon: Pending testing

Theoretical contributions

What's not in this dissertation

  • semantic-inhibition: Data collection ongoing
  • translation-elicitation: In preparation

Methodological contributions

Methods

  • Sample size (N = XX)
  • Bayesian modelling: Quantifying uncertainty, stabilising statistical inference

Software

Barcelona Vocabulary Questionnaire (BVQ)

bvq package +

Levenshtein distance as a valid measure of word-level effects of phonological similarity

jtracer package

Future steps

  • The impact of cognateness in spoken word recognition: Re-analysing data from Study 2
  • Disentangling lexical similarity from phonemic overlap: Basque, Greek
  • Bilingualism and concept lexicalisation: Backward Semantic Inhibition

Thanks!

Appendix

Introduction: Bilingualism

Classification of participants into monolinguals and bilinguals

Introduction: Cognate contents in the aggregated vocabulary

Cognate contents in the aggregated vocabulary

Study 1: Posterior regression coefficients

Aggregated vocabularies might conceal facilitation effects

Study 1: MCMC convergence (\(\hat{R}\))

MCMC convergence for the model in Study 1
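For reference, the idea behind \(\hat{R}\) can be sketched as a comparison of between-chain and within-chain variance. The version below is the classic (non-split) Gelman-Rubin statistic on simulated chains. It is an illustration only and is not necessarily the variant reported here, since current Bayesian software often uses a rank-normalised split version.

```python
import random
import statistics


def r_hat(chains):
    """Classic Gelman-Rubin statistic for chains of equal length."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Four chains sampling the same distribution: values close to 1 are expected
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Two chains stuck around a different mean: values well above 1 are expected
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 0, 3, 3)]
print(round(r_hat(mixed), 3), round(r_hat(stuck), 3))
```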

Study 2: Predictions

  • Successful spoken word recognition across groups
  • If language non-selectivity, stronger interference in cognate vs. non-cognate trials

Study 2: Vocabulary size

Study 2 participant receptive vocabulary sizes across ages and language profiles

Study 2: Model convergence (Exp. 1)

MCMC convergence for the model in Study 2 (Exp. 1)

Study 2: Model convergence (Exp. 2)

MCMC convergence for the model in Study 2 (Exp. 2)

References

Bergelson, Elika, and Daniel Swingley. 2012. “At 6–9 Months, Human Infants Know the Meanings of Many Common Nouns.” Proceedings of the National Academy of Sciences 109 (9): 3253–58. https://doi.org/10.1073/pnas.1113380109.
Bosch, Laura, and Marta Ramon-Casas. 2014. “First Translation Equivalents in Bilingual Toddlers’ Expressive Vocabulary: Does Form Similarity Matter?” International Journal of Behavioral Development 38 (4): 317–22. https://doi.org/10.1177/0165025414532559.
Fenson, Larry, Philip S Dale, J Steven Reznick, Elizabeth Bates, Donna J Thal, Stephen J Pethick, Michael Tomasello, Carolyn B Mervis, and Joan Stiles. 1994. “Variability in Early Communicative Development.” Monographs of the Society for Research in Child Development 59 (5): 1–185. https://doi.org/10.2307/1166093.
Frank, Michael C., Mika Braginsky, Daniel Yurovsky, and Virginia A. Marchman. 2017. “Wordbank: An Open Repository for Developmental Vocabulary Data.” Journal of Child Language 44 (3): 677–94. https://doi.org/10.1017/s0305000916000209.
Mitchell, Lori, Rachel Ka-Ying Tsui, and Krista Byers-Heinlein. 2023. “Cognates Are Advantaged over Non-Cognates in Early Bilingual Expressive Vocabulary Development.” Journal of Child Language, 1–20.
Singh, Leher. 2014. “One World, Two Languages: Cross-Language Semantic Priming in Bilingual Toddlers.” Child Development 85 (2): 755–66. https://doi.org/10.1111/cdev.12133.
Tincoff, Ruth, and Peter W Jusczyk. 1999. “Some Beginnings of Word Comprehension in 6-Month-Olds.” Psychological Science 10 (2): 172–75. https://doi.org/10.1111/1467-9280.00127.
Von Holzen, Katie, and Nivedita Mani. 2012. “Language Nonselective Lexical Access in Bilingual Toddlers.” Journal of Experimental Child Psychology 113 (4): 569–86. https://doi.org/10.1016/j.jecp.2012.08.001.